Chapter 5 Results

Before starting the analysis, we show the current spread of COVID-19 around the world to illustrate why it is a serious global problem. Rather than showing every country, we selected six countries from several continents: China, India, Sweden, Russia, the United Kingdom, and the United States.

The graph below shows the progression of total cases per million for the selected countries: China, India, Sweden, Russia, the United Kingdom, and the United States. Because the curve is cumulative, it shows how the total number of infections changes from day to day, and a flattening curve indicates that the rate of new infections is slowing. The graph shows that after the first confirmed case, the epidemic grew at a gradually increasing rate. Around 400 days after the first case, growth slowed in all of these countries, but shortly afterwards it accelerated again in the United States, Sweden, and the United Kingdom. This suggests that the spread of COVID-19 is difficult to suppress.
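The normalisation behind this graph can be sketched as follows. This is a minimal Python illustration, not the code used for the chapter; it assumes each country's data arrives as (date, cumulative cases) pairs plus a single population figure, and the function name `per_million_since_first_case` is hypothetical. Day 0 is the first day with a confirmed case, which is what lets countries be compared on the "days after the first case" scale used in the text.

```python
from datetime import date


def per_million_since_first_case(records, population):
    """Turn (date, cumulative_cases) pairs into (days_since_first_case, cases_per_million).

    Day 0 is the first date with at least one confirmed case, so countries
    whose outbreaks began on different dates can share one x-axis.
    """
    records = sorted(records)
    # First date on which the cumulative count is positive.
    start = next(d for d, total in records if total > 0)
    return [
        ((d - start).days, total * 1_000_000 / population)
        for d, total in records
        if d >= start
    ]


# Hypothetical toy data, not values from the dataset used in this chapter.
series = [(date(2020, 1, 1), 0), (date(2020, 1, 2), 5), (date(2020, 1, 4), 20)]
print(per_million_since_first_case(series, population=10_000_000))
```

Dividing by population before plotting is what makes a small country such as Sweden comparable with India or the United States on the same axis.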

We also plot the number of new cases per day in these six countries after the first confirmed case. This graph shows how daily new infections in each country change over time and allows the countries to be compared over the same period. India shows a sudden increase in new infections about 400 days after its first confirmed case. In the United States, the number of new confirmed cases generally trended upward after the first case; although there was a turning point about 350 days after the first case, new cases began to rise again roughly 100 days later. This further illustrates that COVID-19 is a serious virus that is difficult to control.
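Daily new cases are the day-over-day differences of the cumulative series. A minimal Python sketch, assuming the cumulative counts are an ordered list with one value per day (the function name `daily_new_cases` is hypothetical):

```python
def daily_new_cases(cumulative):
    """Day-over-day increases of a cumulative case series.

    The first day has no previous value, so the result is one element shorter
    than the input. A negative value usually signals a downward data revision.
    """
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative counts.
print(daily_new_cases([0, 3, 10, 10, 25]))
```

A day with no new reports shows up as 0, which is why the daily curve is much noisier than the cumulative one.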

Since our analysis focuses mainly on the United States, we next show the spread of COVID-19 in the United States separately. All of the graphs below use data collected on 2021-12-09.

The map below shows the distribution of cumulative confirmed cases in the United States as of 12/9/2021. Each red point represents the number of confirmed cases in a city, and the size of the point reflects the relative number of cases: the higher the number, the larger the point. From this map, we can see that larger cities have more cases.

To make this clearer, we use bar charts to show the cumulative confirmed cases and the number of deaths in each of the 50 states. Comparing the two charts shows that states with more confirmed cases generally have more deaths. California, Texas, Florida, New York, and Illinois are the top five states by confirmed cases, while California, Texas, Florida, New York, and Pennsylvania are the top five by deaths.
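The comparison of the two top-five lists amounts to ranking states by each measure and intersecting the results. A minimal Python sketch; the helper names are hypothetical and the per-state totals below are made-up toy values, not the data behind the charts:

```python
def top_k(values, k=5):
    """Keys of `values` with the k largest values, largest first."""
    return sorted(values, key=values.get, reverse=True)[:k]


def shared_top_k(a, b, k=5):
    """Keys that are in the top k of both mappings."""
    return set(top_k(a, k)) & set(top_k(b, k))


# Hypothetical per-state totals, for illustration only.
cases = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2, "F": 1}
deaths = {"A": 5, "B": 4, "C": 3, "D": 1, "E": 0, "F": 2}
print(top_k(cases), top_k(deaths))
print(sorted(shared_top_k(cases, deaths)))
```

Applied to the lists in the text, the same intersection gives four shared states (California, Texas, Florida, New York), with Illinois and Pennsylvania each appearing in only one list.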

5.1 Impact of vaccination on the spread of the epidemic

To examine the impact of vaccination on the spread of the epidemic, we first identify which of the 12 features in our vaccination data are most strongly correlated with the number of cases. We use a correlation heatmap to select the features most relevant to confirmed cases and deaths.

From the graph above, the four features with the highest correlations are total vaccinations, total distributed, people vaccinated, and people fully vaccinated. We use people vaccinated and people fully vaccinated as the two features that represent vaccination. People vaccinated counts everyone who has received at least one dose, including those who are fully vaccinated, whereas people fully vaccinated counts only those who are fully vaccinated. Below are the bar charts of these two features for the 50 states.
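Selecting features from a correlation heatmap amounts to computing each feature's correlation with the target and keeping the strongest. A minimal Python sketch, assuming Pearson correlation (the default of R's `cor()`) and ranking by absolute value so that strong negative correlations also count; both choices, the helper names, and the toy columns are assumptions rather than the chapter's actual procedure:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_features(features, target, top=4):
    """Feature names ordered by |Pearson r| with the target, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:top]


# Hypothetical toy columns; in the chapter the target would be confirmed cases.
target = [1, 2, 3, 4, 5]
features = {
    "a": [2, 4, 6, 8, 10],  # perfectly positively correlated
    "b": [5, 4, 3, 2, 1],   # perfectly negatively correlated
    "c": [1, 3, 2, 5, 4],   # r = 0.8
    "d": [3, 1, 2, 1, 3],   # r = 0
}
print(rank_features(features, target, top=3))
```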

The chart for people vaccinated shows that California, Texas, Florida, New York, and Pennsylvania are the top five states by number of people vaccinated, the same five states that have the most deaths. This may be because these five states have large populations, which makes them rank highest in both vaccinations and deaths. However, none of them is at the top of the list for people fully vaccinated. This may indicate that a single dose does not effectively help stop the spread of the epidemic, whereas full vaccination can.

To show the relationships among people vaccinated, people fully vaccinated, confirmed cases, and deaths more clearly, we plot time series of each feature for the five states mentioned above: California, Texas, Florida, New York, and Pennsylvania.

In the original graph, there is a significant outlier in the daily growth of people_vaccinated, because the people_vaccinated data for Pennsylvania is missing from October 2nd to November 28th. We tried filling the gap with the previous value, but an unreasonably large spike remained, because the whole increase accumulated during the gap then appears on a single day. Therefore, we removed the data for that period. After this cleaning, the graph shows that while the number of vaccinations is increasing, confirmed cases and deaths grow at a decreasing rate. Conversely, when the number of vaccinations stays low, confirmed cases and deaths grow slowly in the short run, but in the long run they increase and can even reach a peak.
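The spike can be reproduced on a toy cumulative series with a gap. A minimal Python sketch, assuming the missing days are stored as `None` and that daily growth is the difference between consecutive days; the helper names and numbers are hypothetical. Forward-filling makes the gap look like zero growth followed by one large jump, while skipping any difference that touches a missing day (analogous to deleting that period) removes the jump:

```python
def forward_fill(values):
    """Replace None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def daily_growth(cumulative):
    """Day-over-day change of a complete cumulative series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def daily_growth_skip_gaps(values):
    """Day-over-day change, or None when either day is missing."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(values, values[1:])
    ]


# Hypothetical cumulative vaccination counts with a three-day reporting gap.
raw = [100, 110, None, None, None, 160, 170]
print(daily_growth(forward_fill(raw)))  # zeros during the gap, then one jump
print(daily_growth_skip_gaps(raw))      # gap left as missing, no jump
```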

To show this trend more clearly, we now use data for the whole United States instead of only these five states. The graph below shows the time series of confirmed-case growth and people-vaccinated growth in the United States.

The graph shows a trend similar to the one above. From January 2021 to April 2021, while the number of people vaccinated grew rapidly, the growth of confirmed cases slowed. Between April 2021 and July 2021, vaccination growth slowed; confirmed-case growth kept declining, but more slowly than in the previous period. The period after July 2021 supports this pattern: while vaccination growth stayed low, confirmed cases began to increase again and peaked around September 2021. This plot therefore suggests that increasing vaccination reduces the growth of confirmed cases and thus helps prevent the spread of COVID-19.

5.2 Impact of hospital capacity and inpatient rate on the death rate

In addition to the relationship between vaccination and COVID-19, we also want to see whether hospital capacity and the inpatient rate affect the COVID-19 mortality rate.

We first draw a scatter plot of hospital capacity against the death rate. Because more populous states tend to have more hospitals, we remove the effect of population by dividing the total number of beds in each state by its total population, which gives the percentage of beds in population.

According to the graph, there is no clear trend between the percentage of beds in population and the death rate, especially if we ignore the state whose value exceeds 0.035. This differs from our initial expectation that the mortality rate would drop as a state's hospital capacity increases. There may be several reasons for this. First, the mortality data may be inaccurate: because the symptoms of COVID-19 are similar to those of the flu, many people may mistake COVID-19 for the flu, which biases the statistics. Second, other factors, such as differences in the number of ventilators or in isolation rules between states, can also affect mortality. Therefore, although this graph shows no obvious relationship between hospital capacity and death rate, we cannot draw a definite conclusion.
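The points in this scatter plot can be built as follows. A minimal Python sketch that assumes per-state dictionaries of beds, population, deaths, and confirmed cases, and assumes the death rate is deaths divided by confirmed cases (the chapter does not define it precisely). The function name, the optional `max_ratio` cut-off used to drop the state above 0.035, and the toy numbers are all hypothetical:

```python
def scatter_points(beds, population, deaths, cases, max_ratio=None):
    """Build (beds per person, death rate) pairs, one per state.

    Death rate is taken as deaths / confirmed cases (an assumption).
    States whose bed ratio exceeds max_ratio are left out when it is given.
    """
    points = {}
    for state in beds:
        ratio = beds[state] / population[state]
        if max_ratio is not None and ratio > max_ratio:
            continue  # e.g. drop the outlier state above 0.035
        points[state] = (ratio, deaths[state] / cases[state])
    return points


# Hypothetical toy figures for two states.
beds = {"X": 300, "Y": 4000}
population = {"X": 10_000, "Y": 100_000}
deaths = {"X": 10, "Y": 30}
cases = {"X": 1000, "Y": 2000}
print(scatter_points(beds, population, deaths, cases))
print(scatter_points(beds, population, deaths, cases, max_ratio=0.035))
```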

We then draw a scatter plot of the hospitalization rate against the death rate.

From the graph above, we can see that the death rate increases as the hospitalization rate increases. There are two possible explanations. First, people tend not to go to the hospital unless their condition is very serious, so as the number of inpatients increases, the death rate increases. Second, during this period there was no very effective treatment for COVID-19 patients. Either explanation would produce a positive relationship between the number of inpatients and the death rate.

5.3 Impact of COVID on life expectancy

To analyze the impact of COVID on life expectancy, we use data from the whole world instead of only the United States. Below are the plots of life expectancy and total number of cases in each country.

We group the countries in the data by continent. Each country has only one life expectancy value, so within each continent we plot each country's total cases per million against its life expectancy. Countries with higher life expectancy generally have more total cases per million. Since a higher life expectancy generally means that a country has a larger elderly population, we roughly conclude that older people are more likely to be infected with COVID-19.
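The grouping used for these plots can be sketched in Python. This assumes one row per country of the form (country, continent, life expectancy, total cases per million); the function name `by_continent` and the rows below are hypothetical toy values, not the chapter's data:

```python
from collections import defaultdict


def by_continent(rows):
    """Group (country, continent, life_expectancy, cases_per_million) rows.

    Within each continent, countries are ordered by life expectancy so their
    cases-per-million values can be drawn against it from left to right.
    """
    groups = defaultdict(list)
    for country, continent, life_exp, cases_pm in rows:
        groups[continent].append((life_exp, cases_pm, country))
    return {continent: sorted(items) for continent, items in groups.items()}


# Hypothetical rows, for illustration only.
rows = [
    ("P", "Asia", 80.0, 300.0),
    ("Q", "Asia", 70.0, 100.0),
    ("R", "Europe", 82.0, 500.0),
]
for continent, items in by_continent(rows).items():
    print(continent, items)
```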

To further check this conclusion, we also use a scatter plot of life expectancy against total confirmed cases per million for each country. The graph above shows a positive correlation between life expectancy and confirmed cases of COVID-19, which supports our earlier conclusion.

## 
##  One Sample t-test
## 
## data:  data_relation$total_cases_per_million
## t = 21.06, df = 2280, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  10398.08 12533.34
## sample estimates:
## mean of x 
##  11465.71
## 
##  One Sample t-test
## 
## data:  data_relation$life_expectancy
## t = 880.18, df = 2280, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  71.84826 72.16912
## sample estimates:
## mean of x 
##  72.00869
## 
##  Pearson's product-moment correlation
## 
## data:  data_relation$total_cases_per_million and data_relation$life_expectancy
## t = 23.453, df = 2279, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4072702 0.4734161
## sample estimates:
##       cor 
## 0.4409417

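According to the Pearson correlation test above, the sample correlation between total cases per million and life expectancy is r ≈ 0.44 (95 percent confidence interval 0.41 to 0.47), and the p-value (< 2.2e-16) is far below any conventional significance level. We therefore conclude that there is a moderate positive correlation between total_cases_per_million and life_expectancy. The two one-sample t-tests only check whether each variable's mean differs from zero, so they do not bear on this relationship.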
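As a cross-check, the Pearson test statistics can be recomputed from r and the sample size alone. R's `cor.test` reports t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom and builds the confidence interval with Fisher's z-transformation, tanh(atanh(r) ± z_0.975 / sqrt(n − 3)). The Python sketch below (the function name is hypothetical) takes r = 0.4409417 and n = df + 2 = 2281 from the output above:

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_test_stats(r, n, conf=0.95):
    """t statistic, degrees of freedom and Fisher-z CI for a Pearson r."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    z = atanh(r)  # Fisher z-transform
    half_width = NormalDist().inv_cdf((1 + conf) / 2) / sqrt(n - 3)
    return t, df, (tanh(z - half_width), tanh(z + half_width))


t, df, (lo, hi) = pearson_test_stats(0.4409417, n=2281)
print(round(t, 3), df, round(lo, 4), round(hi, 4))
```

The recomputed values should agree with the printed R output to the displayed precision, which confirms that the reported interval and t statistic follow directly from the sample correlation.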